I. AIMS OF CONDUCTING MULTI-NATIONAL COMPARISONS

In May 1973 the first three reports from the Six-Subject Survey conducted by the International
Association for the Evaluation of Educational Achievement (IEA) were published (Comber and
Keeves, 1973; Purves, 1973; Thorndike, 1973). They reported evaluations of school education
in 19 countries by drawing upon criteria in Science, Literature and Reading Comprehension,
respectively. Within a short time, the three remaining subject areas will also be reported,
namely English and French as foreign languages and Civics (Lewis and Massad, in press;
Carroll, in press; Farnen, Oppenheim and Torney, in press). In the first stage of its research
on evaluation, IEA focused on Mathematics, which was reported some years ago (Husén, 1967).

One could, indeed, ask about the rationale for embarking on a venture which has included
250,000 students in 9,700 schools in 19 countries, with its far-reaching administrative
implications and formidable technical complexities. When the IEA research was launched some
15 years ago, those who were involved simply wanted to take advantage of the international
variability with regard to both the outcomes of the educational systems and the factors which
accounted for differences in these outcomes. In a way, the world could be conceived of as one big
educational laboratory where different practices were experimented with in terms of school
organisation, curriculum content and methods of instruction. But before trying to analyze
cross-nationally the effects of various input factors on educational outcomes, it was necessary
to devise internationally valid evaluation instruments. Not until the IEA research was launched
did such instruments become available. Therefore the prime concern during the first years of IEA
research was the construction of appropriate measuring techniques that could result in the
establishment of adequate international yardsticks. These were, indeed, badly needed, not least
for evaluating certain technical assistance programs in education in the LDC's. Pure
'head-counting', for instance enrollment and graduation statistics (see, e.g. Harbison and
Myers, 1964), was used as a criterion of evaluation, lacking qualitative indicators, such as
student competence achieved in various subject areas. The efforts at the beginning of the IEA
research to devise instruments by means of which international standards could be established
unfortunately gave some people the false impression that the main purpose of the exercise was to
conduct some kind of international horse race or 'cognitive Olympics'. But the development of new
evaluative techniques and the setting up of an international cooperative machinery that went with
it was a prerequisite for establishing international standards in a series of cognitive domains,
such as Mathematics and Reading. Not until the IEA reading survey, which also comprised three
LDC's (Chile, India and Iran), were any comparative assessments of the level of literacy among
representative groups of students in such countries available.

Once measuring instruments were available, the next step was to identify the salient factors
which accounted for cross-national differences. Since this could be done in a replicative way at
the various levels of the single national systems and across these systems, a much more
multi-faceted picture of factors accounting for differences in student attainment between school
systems could be obtained. The comparative approach implied that we widened the population of
classrooms from one particular school within one particular national system to a representative
set of classrooms within several national systems. Thus, IEA shared the ambitions prevalent in
the social sciences in general, that is to say, to arrive at generalizable findings. By repeating
surveys and analyses over many countries, which differed with regard to important social and
economic factors, a more detailed picture of what accounted for differences in 'productivity'
between these systems could be arrived at. Since the ultimate aim of research in the social
sciences is not only to identify and describe but to explain and predict, that is to say,
generalise, the basis for such an operation can be broadened by including inter-system and
inter-country variables which allow cross-national generalizations and also make it possible to
study how intra-system and inter-system variables interact.

We can take as an illustration how class size is related to student performance. Practically
all the sample surveys that so far have been conducted have been carried out in the United States
and some West European countries. These studies consistently indicate that class size and
performance tend to be positively correlated at the level of 0.10 to 0.20 (Marklund, 1962). The
fact, however, that class size within these countries covers a rather narrow range makes
generalizations about such a relationship awkward. In a multi-national study one can take into
account variables such as teacher competence, school resources, and socio-economic structure,
which vary widely between countries. This provides an opportunity not only for obtaining a more
diversified descriptive picture but also for opening up new avenues of analysis.

One overriding purpose of the IEA Six-Subject Survey has been to study the relationship
between input factors in the social, economic and instructional domains and output as measured by
international tests covering both cognitive (student performance) and affective behaviors (student
attitudes and motivation). These relationships have been studied in some twenty national systems
of education and, as a rule, at three different levels within each system.

Anderson (1969) points out that the prime advantage in international cooperation in
educational research lies in overcoming undue generalizations or 'under-generalizations' as well
as distortive cultural bias.

    "Scientific research in education, as in the behavioral sciences in general,
    is a search for empirically valid and theoretically interesting generalizations
    about the behavior of human beings. This search is hampered by many
    obstacles, not the least of which is the problem of cultural bias and distortion.
    These problems are illustrated by two types of errors . . . one is the error
    of 'over-generalization'. We assume that what we discover to be true of
    learning-teaching behaviors of some part of human species is true of the
    behaviors of all of the species, when in fact it is not."

    "A second error is found in our tendency to 'under-generalize'.
    In this case we assume that what we discover to be true of the
    behavior of some given part of mankind is uniquely true of only that
    part, when in fact what is true of the part is also true of the whole.
    Thus, the search for reliable knowledge about the process of human
    education in large measure is a matter of progressively eliminating
    generalizations which erroneously assume either more or less
    communality in our species' learning-teaching behaviors than do in
    fact exist." (Anderson, 1969, p. 144).

The replication aspect of cross-cultural research in education is also emphasized by
Gage (1963), who hopes that by advancing theory in education it might be possible to identify
'laws' or principles of teaching that would cut across subject areas, grade levels, and teacher
categories. One could in this context refer to the model of teaching advanced by Beeby (1966),
which is an attempt to relate the level of development of formal schooling to the overall level
of development reached by the nation where the teaching takes place. Another illustration might
be Flanders' study of teaching behavior and student achievement in Minnesota and New Zealand,
which was conducted on the hypothesis that such 'laws' or principles could be identified
(Flanders, 1970).

After the completion of the IEA Mathematics survey (Husén, 1967a), two international
meetings were held under the title "Toward a Cross-National Model of Educational Achievement
in a National Economy" (Super, 1970). The aim was to develop an input-output model that could
serve as a more powerful theoretical framework for the next survey, where achievement criteria
from six subject areas were going to be developed. Researchers from various social science
disciplines were brought together to review both national and international research already
undertaken and to advance new hypotheses which could be tested in further research. They were
also asked in this connection to suggest independent variables of a social and economic nature
that should be included in the proposed survey.

A key problem in conducting cross-national evaluation studies, where comparisons are
made between student performance by means of standardized achievement tests, has to do with
comparability per se (Husén, 1967b). Two major comparability problems are encountered: the
drawing of strictly comparable samples of students and the construction of measuring instruments
that are 'fair' in terms of their content matching the students' opportunity to learn the
subject matter tapped by the tests. The technical aspects of these problems have been dealt with
in detail in the IEA International reports (see, e.g. Peaker, in press; Comber and Keeves, 1973,
page 42 et seq.). IEA has succeeded in establishing a system whereby national random samples,
be they age samples or grade samples, can be drawn. Once the target populations have been
defined (e.g., 14-year-olds) and the sampling design has been drawn up, such that each student has
a specified non-zero chance of entering into the sample, the problem of executing the sample is
mainly an administrative one. In several countries, both developed and less developed, the
conduct of the Six-Subject Survey was the first occasion when nationally representative samples
of students were drawn. The experiences gained in, for instance, countries like Iran and India
could be drawn upon in the future when procedures of evaluating entire national systems by means
of random samples are going to be established as routines.
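
The requirement that each student have a specified non-zero chance of entering the sample can be
illustrated with a minimal sketch. It assumes a simple two-stage design (schools drawn at random,
then students drawn at random within each selected school); this is an illustrative assumption,
not the sampling design actually used by IEA, and the function name `two_stage_sample` and the
toy school rosters are hypothetical.

```python
import random


def two_stage_sample(schools, n_schools, n_students, seed=0):
    """Draw schools at random, then students at random within each drawn school.

    `schools` maps a school id to its list of student ids (hypothetical data).
    Returns (school, student, inclusion_probability, weight) tuples.
    """
    rng = random.Random(seed)
    sample = []
    for school in rng.sample(sorted(schools), n_schools):
        roster = schools[school]
        take = min(n_students, len(roster))
        # P(student sampled) = P(school drawn) * P(student drawn | school drawn)
        prob = (n_schools / len(schools)) * (take / len(roster))
        for student in rng.sample(roster, take):
            sample.append((school, student, prob, 1.0 / prob))
    return sample


# Toy population: four schools of unequal size (invented for illustration).
schools = {"A": list(range(40)), "B": list(range(25)),
           "C": list(range(60)), "D": list(range(10))}
sample = two_stage_sample(schools, n_schools=2, n_students=5)
for school, student, prob, weight in sample:
    print(school, student, round(prob, 4), round(weight, 1))
```

Because every inclusion probability is known and greater than zero, each sampled student can be
weighted by its reciprocal, which is what allows estimates for the whole target population.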

One main criticism levelled against the IEA mathematics study by mathematics
educators in a special issue of the Journal for Research in Mathematics Education (Findley, 1971)
was that there were considerable differences between countries in terms of the amount of
exposure the students had had to teaching of the various topics covered by the items in the
international mathematics tests. True enough, country means of teachers' ratings of 'opportunity
to learn' and student achievement tend to be rather highly correlated over countries (see, e.g.
Comber and Keeves, 1973, page 158 et seq.). But it should be kept in mind that rank order
correlations between country aggregates could be quite high, and they are indeed. When countries
were correlated over item difficulties, it was found that the overlap in achievement structure was
remarkable, that is to say, country differences were only to a minor extent accounted for by
dramatic differences in particular topics or sub-areas within one subject but rather by systematic
differences over the whole range of items. At least in subjects like Mathematics and Science,
where the subject matter by its very nature is rather universal, the differences between national
systems seem to affect all topical areas in a systematic way and not just a few.
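
The point about item difficulties can be made concrete with a small sketch. It computes a
rank-order (Spearman) correlation between the item difficulties of two countries. The proportions
correct below are invented for illustration and are not IEA data; the helper names are likewise
hypothetical. Country B is lower on every item, yet the items fall in the same order of difficulty.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Rank-order correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Proportion of students answering each of five items correctly (invented).
country_a = [0.85, 0.70, 0.62, 0.48, 0.30]
country_b = [0.55, 0.41, 0.35, 0.20, 0.08]
print(spearman(country_a, country_b))
```

A coefficient close to 1 alongside a large gap in mean score is the pattern described above: a
systematic difference in level rather than a difference confined to particular topics.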

The construction of international achievement tests and the machinery that went with it
in a way served as a safeguard against undue cultural bias. An international committee was set
up for each of the subject areas included in the Six-Subject Survey. These committees, being
composed of subject matter specialists, teachers, test developers and curriculum specialists,
were responsible for the construction of the test instruments and for the development of
questionnaires related to their respective fields (see, e.g. Comber and Keeves, 1973, page 27
et seq.). Contact with the participating countries was effected through the National Research
Centers and subject committees set up in each country. The analyses of the curricula, the
proposing of item material and the try-out of the items were carried out in the participating
countries. The IEA Headquarters served only as a co-ordinating center and a clearing house.

Furthermore, since evidently the main purpose of achievement tests is to measure
differences in achievement, complete equality in terms of exposure to teaching and opportunity
to learn would make the administration of such tests pointless. The same applies to intelligence
tests, where individual and group differences unavoidably also reflect differences in terms of
opportunity. As has been spelled out in another connection (Husén, 1967b), the administration
of achievement tests internationally differs only in degree and not in principle from the
administration of them nationally. Within a given country there are differences between school
districts and regions due both to differences in student background and school resources. Very
few would dispute the worthwhileness of administering the same test of achievement to all the
children at the same grade level in a given country, once the test measures the main objectives
it is purported to measure. For instance, the finding within a given country that children in
urban areas perform better than children from rural areas or that socially privileged have higher
scores than under-privileged is per se not to be interpreted as an act of discrimination against
those who socially and pedagogically have been subjected to the less favorable conditions. The
establishment of the factual differences in terms of these criteria, once the latter have been
agreed upon, is in itself of informative value. It can, as in the case of the IEA research, serve
as a basis for analysis of what the factors are that account for differences in performance and
ultimately can be used for more adequate educational policy. The data collected can also serve as
a basis for evaluating how far the students have been brought under the prevailing conditions and
for analyses of what could be done in order to improve these conditions.

The rationale indicated above also applies to comparisons between highly industrialized
and more or less agricultural economies, in brief to comparisons between developed and
less-developed countries (LDC). So far, no representative comparative information with regard to
student competence in LDC's has been available. Those who have first-hand experience have
intuitively felt that differences between students who grow up in countries where there is a long
tradition of literacy and those whose parents in most cases are illiterate sometimes are
[illegible].

One might well raise the question of the worthwhileness of an elaborate exercise like the
one pursued by IEA to develop international standards of evaluation, considering the tremendous
differences between the two categories of countries in terms of culture and tradition. But if the
goal in the LDC's is to achieve 'modernization', i.e. among other things, to bring about an
infrastructure of knowledge and skills conducive to an economic development which has led to
affluence in the industrialized countries, then there is much to be said for attempts to measure,
for example, basic reading skills and the knowledge in Science that is basic to the creation of a
modern technology.



II. ORGANIZATION OF INTERNATIONAL EVALUATION OF 
EDUCATIONAL OUTCOMES 

To conduct multi-national evaluation surveys is a complicated task. A basic prerequisite is the
setting-up of some kind of machinery that can secure the necessary co-ordination and communication
between the participating research institutions. The national research centers have to take
decisions about subject areas and problems they want to investigate. A uniform design guiding the
construction of instruments, data collection and data processing has to be laid down. A timetable
for all these activities has to be agreed upon. Since several languages are involved - in the
Six-Subject Survey no less than 14 - problems of translation of tests and manuals of instruction
have to be properly handled. For instance, to what extent is it possible to avoid cultural biases
when tests of reading comprehension are constructed, translated, and given in vastly different
cultural settings? This problem is a challenging research task in its own right. It was dealt
with in the feasibility study (Foshay, 1962) and was further elucidated in the Six-Subject Survey
when Reading tests were given to students in three developing countries (Thorndike, 1973).
However, communication problems are not solved by penetrating language barriers only. Differences
in national values and habits can cause difficulties, not least with regard to promptness - or
lack of promptness - in responding to letters and sticking to timetables.

Since IEA constitutes the largest network of co-operating research institutes conducting
empirical research in education in the world today, it would seem appropriate to describe briefly
its organizational features.

In 1959 a group of researchers from twelve countries, who convened under UNESCO
auspices, decided to embark upon a small pilot study to examine to what extent it was feasible and
meaningful to undertake multi-national 'standardized' survey research. The pilot study turned out
to be rather successful in both respects. It was possible in a series of subject areas to construct
achievement tests that could be translated and administered uniformly to students in different
countries and to arrive at meaningful interpretations of between-country differences (Foshay, 1962).
It was administratively and technically feasible to collect data uniformly and to have them
processed in one place. Therefore, it was decided to undertake a more rigorous study using
probability samples from twelve countries, all of which were industrialized (eight West European
countries, the United States, Israel, Australia, and Japan). Student achievement in Mathematics
was chosen as the criterion of output, since this subject by its universal nature seemed to be
more readily accessible to international comparisons than other subject areas, possibly with the
exception of Science.

In the IEA Mathematics study two major levels in the school systems of the twelve
countries were sampled (Husén, 1967):

(a) 13-year-olds (both age and grade populations), since this was the last
    point in all the systems where one hundred per cent of the relevant age
    group was still in full-time schooling; and

(b) pre-university grade students.

In all, 133,000 students from 5,400 schools were tested and completed questionnaires in the
Mathematics study. Furthermore, 13,500 teachers and 5,450 school principals completed
questionnaires with information on instruction, curriculum, and school resources. The information
gathered in this survey was used to test hypotheses concerning: (1) the relationship between
different teaching practices in school and outcomes of instruction; (2) the relationship between
organizational features of the systems, such as age of school entry, grouping practices, and
student-teacher ratio, and outcomes; and (3) the relationship between home background and
outcomes. Several special studies, for instance one on the relationship between the 'yield' and
certain organizational features (Postlethwaite, 1967), were also conducted.

After the completion of the feasibility study and the first main study (in Mathematics) the
participating research centers in 1967 formed a corporate body. The main reason for this was to
establish IEA as a legal entity eligible for research grants. Thus, IEA is now an international,
non-profit-making, non-governmental association constituted under the name of the 'International
Association for the Evaluation of Educational Achievement'. According to the statutes its principal
aims are:

(a) to undertake educational research on an international scale;

(b) to promote research aimed at examining educational problems common
    to many countries in order to provide evidence which can help in the
    improvement of educational systems; and

(c) to provide, within the framework of the Association, means whereby
    research centers, which are members of the Association, can undertake
    co-operative projects.


The Association is constituted in accordance with the Belgian law of 1919 regarding
international non-profit-making scientific societies, which was modified by a law of 1954.
IEA has from its inception had close relationships with the United Nations Educational, Scientific
and Cultural Organization (UNESCO). The feasibility study and the Mathematics survey were
conducted under the auspices of the UNESCO Institute for Education in Hamburg, where the IEA
working headquarters were located until 1969, when they were moved to Stockholm, where they are at
present accommodated within the Institute for the Study of International Problems in Education.
IEA has a consultative relationship with UNESCO.

Membership in IEA is restricted to institutions carrying out research in education. In
order to be eligible for membership an institute should have a good reputation, qualified staff,
ready access to schools in the national school system, and the necessary financial resources to
carry out the research work to which the institute has committed itself. Membership is, upon
application, decided upon by the IEA Council, which is made up of one representative from each
National Center. The number of members is presently 23, consisting of ten West European countries
(Finland, Sweden, Federal Republic of Germany, Scotland, England, Ireland, Netherlands, Belgium,
France, and Italy), three East European countries (Poland, Hungary and Romania), and ten
non-European countries (Israel, Iran, India, Thailand, Australia, New Zealand, Japan, Chile, and
the United States).

The Council meets, in principle, once a year and determines the general policy of the
Association. It elects a Chairman and a Standing Committee consisting of six of its members.
The Standing Committee elects two of its members to serve with the Chairman on the Bureau, which
meets several times a year and is responsible for the execution of decisions taken by the Council.
The center staff employed by IEA consists of an Executive Director, research officers, technical
assistants and secretaries. During the Six-Subject Survey two data processing units were
established, one in New York for the first stages of processing and one in Stockholm for further
processing and the statistical analyses. A data bank has been established at the University of
Stockholm.

In conducting the Six-Subject Survey, the Council had to establish various bodies for the
conducting and reporting of research. As mentioned above, one international committee in each
subject area in which survey research is undertaken is appointed by the Council. Further, the
Council set up a Technical Committee which was responsible for overall technical problems
pertaining to sampling, data collection, and data processing. The international committees interact
with national committees set up in the various subject areas. For example, during the IEA
Six-Subject Survey some 300 persons spread across 19 countries with 14 different languages were
engaged in the construction of instruments. During the Mathematics study English and French were
used as linguae operandi at international meetings and in correspondence, but in the Six-Subject
Survey it was decided to use only English.

In the Six-Subject Survey 250,000 students, 50,000 teachers, and 9,700 schools in some
20 countries were involved in testing and completion of questionnaires. The data were made
available to the data processing center on either cards which could be optically scanned
(MRC-cards; in most cases), tapes or punched cards. The MRC-card reading took place in Iowa City;
the editing, sorting, filing, item analysis and run-off of univariates were done in New York at
Columbia University; and the bivariate and multivariate analyses were conducted at the University
of Stockholm. Data on some 2,000 variables were collected, most of these being input variables.
The variables in any one subject area at any one level of the school system amounted to between
200 and 500. To be sure, there were too many to be manageable in multivariate analyses and they
had to be considerably whittled down on the basis of analyses of the intercorrelation matrices.

III. SOME MAJOR FINDINGS IN THE IEA SIX-SUBJECT SURVEY

The following three target populations were sampled in the Six-Subject Survey: 

Population I  - all students in full-time schooling aged 10:00-10:11 at the
                time of testing;

Population II - all students in full-time schooling aged 14:00-14:11;

Population IV - all students in the terminal year in full-time secondary
                school programs which were either pre-university programs
                or programs of the same length (this gave the National
                Centers some latitude of interpretation, which means that
                in some countries only those students who are about to
                complete courses which in a narrow sense qualify for
                university entrance were included, whereas in other
                countries those who are about to complete qualified
                vocational programs were also included).

It would indeed be preposterous to try to condense the findings from the comprehensive
Six-Subject Survey into a few pages. The report series will upon completion consist of nine
volumes. We shall therefore confine ourselves here to a presentation of some findings which seem
to have a particular bearing on the evaluation of education in LDC's, particularly since this is
the first occasion when qualitative comparisons between industrialized countries and LDC's have
been made according to agreed-upon international yardsticks.

Table 1 shows the means and standard deviations in total Science score and total Reading
Comprehension score in the 19 participating countries, of which four are mainly less developed.
We have limited ourselves to these two cognitive criteria, since data on them are available for
four and three LDC's respectively. The only LDC which participated in Literature was Chile, which
also participated in English and French. Iran was the only LDC participating in Civics.



The most dramatic difference is the one between the industrialized and non-industrialized
countries. The latter are consistently far behind the former in average achievement over subject
areas and levels of schooling. In Science the LDC's score was roughly one standard deviation or
more below the more developed. This means, then, that in Science the average student in an LDC
scores between the 10th and 12th percentile in a developed country. The difference is even more
pronounced in Reading Comprehension, where only some 5 to 10 per cent of the students in the
LDC's score at the level of the average student in a more developed country. Chile participated,
as mentioned above, in the survey of French and English as foreign languages and Iran in Civics.
The mean cognitive scores in both cases turned out to be on the same relative level as in Science
and Reading.
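
The step from a gap in standard deviation units to a percentile can be sketched as follows. The
sketch assumes that scores in the reference population are normally distributed, which is an
assumption made here for illustration and not a claim of the survey; the function names are
hypothetical. The means and the standard deviation are the 10-year-old Science entries of Table 1.

```python
from statistics import NormalDist


def gap_in_sd(group_mean, reference_mean, reference_sd):
    """Distance of a group mean below the reference mean, in reference SD units."""
    return (reference_mean - group_mean) / reference_sd


def percentile_in_reference(gap):
    """Percentile rank, in a normal reference distribution, of a score `gap` SDs below its mean."""
    return 100.0 * NormalDist().cdf(-gap)


# Science, 10-year-olds, Table 1: industrialized countries M = 16.7, SD = 7.9.
REFERENCE_MEAN, REFERENCE_SD = 16.7, 7.9
ldc_means = {"Chile": 9.1, "India": 8.5, "Iran": 4.1, "Thailand": 9.9}

for country, mean in ldc_means.items():
    gap = gap_in_sd(mean, REFERENCE_MEAN, REFERENCE_SD)
    print(f"{country}: {gap:.2f} SD below, percentile {percentile_in_reference(gap):.0f}")
```

Under the normality assumption a gap of exactly one standard deviation corresponds to roughly the
16th percentile, so a position between the 10th and 12th percentile corresponds to a gap of about
1.2 to 1.3 standard deviations.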



Table 1

Mean Total Score (M) and Standard Deviation (SD) in Science and Reading
Comprehension Among 10-Year-Olds, 14-Year-Olds, and Pre-University
Students in 19 Countries

                                        SCIENCE                      READING COMPREHENSION
                           10-yr-olds  14-yr-olds   Pre-univ.  10-yr-olds  14-yr-olds   Pre-univ.
Country                       M    SD     M    SD     M    SD     M    SD     M    SD     M    SD

Australia                     -     -  24.6     -  24.7  10.7     -     -     -     -     -     -
Belgium (Flemish)          17.9   7.3  21.2   9.2  17.4   [?]  17.5  10.2  24.6   9.7  25.0   9.3
Belgium (French)           13.9   7.1  15.4   8.8  15.3   7.9  17.9   9.3  27.2   8.7  27.6   9.7
England                    1?.7     -  21.3  14.1  23.1  11.5  18.5  11.6   [?]  11.9   [?]   [?]
Fed. Rep. of Germany       14.9   7.4   [?]  11.5  26.9   8.9     -     -     -     -     -     -
Finland                    17.5   8.2  ?0.5  10.6  19.8   9.8  19.4  10.8  27.1  10.9  30.0   7.5
France                        -     -     -     -  38.3   8.7     -     -     -     -     -     -
Hungary                    16.7   8.0  29.1  12.7  23.0   9.0  14.0   9.8  25.5   9.9  23.8   8.9
Israel                        -     -     -     -     -     -  13.8  11.0  22.6  12.8  25.2  10.8
Italy                      16.5   8.6  18.5  10.2  15.9   8.8  19.9   8.8  27.9   9.3  23.9  10.2
Japan                      21.7   7.7  31.2  14.8     -     -     -     -     -     -     -     -
Netherlands                15.3   7.6  17.8  10.0  ?3.3  11.1  17.7   ?.5  25.?  10.2  11.2   7.0
New Zealand                   -     -     -  12.9     -  11.6     -     -  ?9.1  11.0   [?]     -
Scotland                   1?.0     -  21.?  14.?  ?3.1  12.1  18.4  11.1  ?7.0  11.?  ?4.4   [?]
Sweden                        -   7.1  21.7   [?]   [?]   [?]     -  10.?   [?]  10.?     -     -
United States              17.?     -   [?]  11.6   [?]     -     -  11.6  27.?  11.?   [?]   [?]

Industrialized countries   16.7   7.9  22.3  11.8  20.9   9.9     -     -     -     -     -     -

Chile                       9.1   8.6   9.2   8.9   8.8   6.0   9.1   9.3  14.1  11.1  16.0   8.8
India                       8.5   8.3   7.6   9.0   6.0   6.0   8.5   9.4   5.2   7.2     -   5.8
Iran                        4.1     -   7.8   [?]  10.2   5.6   3.7   6.9   7.8   6.7   ?.4   6.0
Thailand                    9.9   6.5  15.6   8.1  12.4   6.1     -     -     -     -     -     -

-  no entry in the source;  ? or [?]  digit(s) illegible in the source.
Thailand did not test national sample but sampled schools^ In the Bangkok area. 
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What explxinations cah be advanced for such big differences? In the first place, nuLst 
emphatically caution against any preinature conclusions about the 'productivity' or 'efficacs * of tlie 
school systems in the two types of countries on the basis of the mean scores presented in Table 1, 
The differences that we find between the industrialized countries are negligible iTi comparison with 
the gap between the two categories of countries. There is, however, no reason to believe that the 
rich countries with regard to their school systems all are on the same level of 'efficacy'. 

A first-hand explanation that would seem plausible is that the tests are not doing justice to
the children in the LDC's. The tests might draw upon knowledge and learning experiences that are
more predominant in the rich countries. Furthermore, the test situation as such and the format of
assessing the outcomes of learning might imply a certain cultural bias against students in LDC's.
We certainly cannot entirely refute such hypotheses, but they do not get much support from the
empirical evidence we have. In the first place, the content of the tests, i.e., the individual test
items, went through a long procedure of scrutiny and try-out before they were 'passed' by all the
national subject area committees and included in the international tests. Secondly, the rank order
of difficulties of items tended to be highly correlated over countries, which indicates that differences
in total scores between countries are not so much accounted for by differences in particular
sub-areas or topics of a particular subject as by systematic differences in level of competence. The
teachers were asked to rate, on a four-point scale, each item in the tests with regard to what
opportunity the students in his or her class had had to learn the subject matter that was assessed by the
item. As far as Science is concerned the average opportunity tended to be somewhat lower for
Populations II and IV in the LDC's (see Comber and Keeves, 1973), but these differences in
opportunity can by no means explain more than a small portion of the difference in mean performance.

The main factor is no doubt the socio-economic gap between the two categories of countries.
Education does not operate in a socio-economic vacuum, which not the least is shown by the
consistently substantial correlations between various family background measures and student achievement
in all subject areas. Passow, Noah and Eckstein (in press) have, in their report on the National
Case Study Questionnaire, drawn up 'national profiles' for the 19 countries which participated in the
first stage of the Six-Subject Survey. The size of the per capita GNP varies from about U.S. $1,400 -
4,300 in the industrialized countries, whereas it varies from U.S. $90 - 270 in the LDC's which are
in the study. The size of the non-primary sector in per cent of the GNP is in most cases [illegible] to
[illegible] per cent in the rich countries as compared to 50 to 75 per cent in the LDC's. The difference is
even more marked if we measure the size in terms of number of people employed in the primary
and non-primary sectors respectively.

Thus, the difference between developed and less developed countries could be expected,
considering the overall socio-economic setting for the school systems in the two categories of
countries. The outcomes of the multivariate analyses, which will be dealt with below, tell us that
the total effect of home background variables in both Science and Reading is greater than the total
effect of all the school variables. Among the 10-year-olds 35 per cent of the variation between
students can be attributed to family background and 22 per cent to school factors, including,
of course, all the instructional factors. The corresponding figures for the 14-year-olds are 42 and
26 per cent respectively. What is then 'family background'? After a careful study of some 20
variables that could be considered as candidates for an overall measure of social background, the
following were selected to form a composite School Handicap Score (SHS): (1) Father's occupation,
(2) Father's education, (3) Mother's education, (4) Use of dictionary at home, (5) Number of books
at home, and (6) Family size. It is pointed out in the international report in Science that the
"effectiveness of the education provided by the school must be assessed by what is achieved, after
allowance has been made for the nature of the community in which the school is operating" (Comber
and Keeves, 1973, page 195). Thus, regardless of the quality of the formal educational system in
the LDC's, we can, on the basis of the impact of the family background factors, predict a large
difference in mean achievement between them and the more industrialized countries. Parents in
the former type of countries are in most cases illiterate and no reading material is available at
home. On the whole, the verbal environment in which the children grow up is almost entirely oral
and there are rather few occasions in which reading skills picked up at school can be reinforced by
experiences at home.

A simple reading speed test was developed in order to measure to what extent the mechanics
of reading skills had been acquired. The items consisted of short paragraphs of two or three simple
sentences, and the students, by checking the right answer of a choice of three, had to indicate that he
had understood what he had read. The items were like this:

Peter has a little dog. The dog is black with a white spot on his back
and one white leg. The color of Peter's dog is mostly
black   brown   grey.

On the average, 10-year-olds in Europe had an error rate of about 10 per cent on items
such as the one cited. At the 14-year-old level the rate had gone down to about 4 per cent. In
the three LDC's the rates were:

            10-year-olds    14-year-olds
                 %               %
Chile           26              16
India           36              33
Iran            52              20

Therefore, there is some justification for doubts about whether quite a few of the 10- and
14-year-olds in the LDC's had been able to read the Science items and the questions in the student
questionnaires.



IV. THE RELATIVE 'EFFECT' OF HOME AND SCHOOL 

The Coleman study on "Equality of Educational Opportunity" (1966) was a massive attempt to
disentangle the unique 'effect' of the school as compared to the home on student achievement.
Notwithstanding the doubtful quality of the criteria of outcomes of instruction, such as a simple
reading test which happened to be taken from a subject that to a rather limited extent is
'school-based', the study gave rise to an intensive technical debate with criticism of the causal
ordering of variables in the regression analyses. The limitations of using cross-sectional data for
'effect-studies' of this type were also pointed out. The IEA Six-Subject Survey by and large falls
victim to the same criticisms but can claim the following virtues. In the first place, the
multivariate analysis has been conducted over a series of national educational systems and at different
levels of the systems. Furthermore, which seems more important, the study covers a wide array
of subject areas, both those which a priori can be regarded as highly school-based, such as foreign
languages and Science, and those which are less school-based, such as Reading.

In the first place, the total variance accounted for was consistently larger than in previous
studies to which reference has been made in the debate on the relative effect of home and school
(Coleman, 1966; Jencks, 1972). Secondly, the school factors or 'learning conditions' at school
tended to be of increasing importance as one moved from lower to higher levels in the system.
Finally, Reading tended to differ considerably from Science and foreign languages in terms of the
role played by the home. As pointed out above, this would seem to be the main explanation why the
gap between the LDC's and the industrialized countries is larger in Reading than in other subject
areas. Thorndike (1973, p. 177) sums it up like this:

"a dominant determiner of the outcome from a school in terms of
reading performance is the input in terms of students that go to
school. When the population of a school comes from homes in
which the parents are themselves well educated, economically
advantaged, and able to provide an environment in which reading
materials and communications media are available, the school
shows a generally superior level of reading achievement."

Rank order correlations between means should, as indicated above, be looked upon with
suspicion and interpreted with great caution, since they tend to boost heavily relationships that are
much weaker at the level of the individual. But the following series of rank order correlations
between mean achievement in Reading and various home background factors in the 15 countries
which participated in the Reading survey casts some light on the statement quoted above, which
was based on a broader spectrum of evidence:

Father's education                        0.60
Mother's education                        0.73
Expected (own) education                  0.67
Parents' help with homework               0.50
Parents' encouragement to read            0.56
Number of books at home                   0.85
Number of magazines at home               0.71
Hours listening and watching radio/TV     0.92

For the IEA-Harvard Graduate School of Education meeting on the implications of the IEA
findings, Coleman (1973) collated the outcomes of the between-student analyses reported in Comber
and Keeves (1973), Purves (1973), and Thorndike (1973) for the six countries which tested both
10- and 14-year-olds in all the three subjects which were covered in Stage II (Reading, Science and
Literature). It should be mentioned that the Literature score refers to the ability to comprehend
literary prose. A comparative study of the outcomes of the between-student multivariate analyses is,
as was pointed out earlier, of greatest interest because of its replicative nature. Parallel analyses
have been conducted in a variety of countries, which provides a broader perspective and facilitates
meaningful interpretations.

Table 2

Relative Contribution of Home and School Variables in Accounting for
Between-Student Differences at the 10-Year and 14-Year Old Level

              Chile       England     Finland     Italy       Sweden      U.S.        Average
              10    14    10    14    10    14    10    14    10    14    10    14    10    14

Total Home Background Effects
Science       0.20  0.36  0.46  0.48  0.37  0.4?  0.20  0.32  0.40  0.42  0.42  0.47  0.34  0.42
Reading       0.12  0.45  0.47  0.52  0.42  0.45  0.31  0.32  0.34  0.40  0.45  0.47  0.35  0.44
Literature    -     0.38  -     0.50  -     0.43  -     0.33  -     0.39  -     0.43  -     0.42

Total Direct School Effects
Science       0.30  0.26  0.18  0.30  0.21  0.34  0.20  0.26  0.23  0.28  0.32  0.28  0.24  0.29
Reading       0.29  0.28  0.13  0.19  0.18  0.23  0.22  0.19  0.18  0.18  0.21  0.28  0.20  0.2
Literature    -     0.32  -     0.22  -     0.26  -     0.18  -     0.26  -     0.30  -     [illegible]

Source: Coleman (1973)
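
The 'Average' columns of Table 2 can be checked against the country figures. The sketch below assumes, which the source does not state, that each average is the unweighted mean of the six countries. It uses only the 10-year-old rows for Science and Reading, and the names `HOME_10`, `SCHOOL_10`, `average_agrees` and `school_exceeds_home` are illustrative.

```python
from statistics import mean

COUNTRIES = ["Chile", "England", "Finland", "Italy", "Sweden", "U.S."]

# Age-10 figures as printed in Table 2: (country values, printed average).
HOME_10 = {
    "Science": ([0.20, 0.46, 0.37, 0.20, 0.40, 0.42], 0.34),
    "Reading": ([0.12, 0.47, 0.42, 0.31, 0.34, 0.45], 0.35),
}
SCHOOL_10 = {
    "Science": ([0.30, 0.18, 0.21, 0.20, 0.23, 0.32], 0.24),
    "Reading": ([0.29, 0.13, 0.18, 0.22, 0.18, 0.21], 0.20),
}


def average_agrees(values, printed, tolerance=0.005):
    """True if the unweighted mean matches the printed average to two decimals.

    Assumption: the 'Average' column is a simple mean over the six countries.
    """
    return abs(mean(values) - printed) <= tolerance


def school_exceeds_home(home_values, school_values):
    """Countries where the school effect is strictly larger than the home effect."""
    return [
        country
        for country, home, school in zip(COUNTRIES, home_values, school_values)
        if school > home
    ]


if __name__ == "__main__":
    for subject in HOME_10:
        home_values, home_avg = HOME_10[subject]
        school_values, school_avg = SCHOOL_10[subject]
        print(subject, "home average agrees:", average_agrees(home_values, home_avg))
        print(subject, "school average agrees:", average_agrees(school_values, school_avg))
        print(subject, "school > home in:", school_exceeds_home(home_values, school_values))
```

Under that assumption the printed averages agree with the computed means to within rounding, and Chile is the only country in which the school figure exceeds the home figure at age 10 in both subjects.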



It has been indicated above that there is some consistency among the five more developed
countries that home effects account for more than school effects. As far as the 10-year-olds are
concerned that does not apply to Chile. As can be studied in more detail in Professor Thorndike's
report (1973, page 88 et seq.), the R-values and the per cent of added variance for Block I in the
regression analysis (home background, age and sex) are much lower, particularly in India and Iran,
than in the other countries. This indicates a relatively greater importance of school factors in
these countries as compared with the richer ones.


V. EDUCATIONAL OPPORTUNITIES AND RESOURCE ALLOCATION 

Since information was available on parental occupation and parental education, a comparative study
could be made on the degree of equity that went into a national system, or, conversely, how priorities
were reflected in the social selection that took place when the students moved up to the pre-university
level.

One overriding educational policy problem in all LDC's has been to what extent and how
fast formal school education should be made universal and how much education, i.e., how many
years of schooling, could be provided to how many students. This problem can be resolved in
a more egalitarian or a more elitist direction. It has, among educational planners in LDC's, often
been advocated that in the long run the educational system would provide a better yield if the scarce
resources were not spread thin and (at least in theory) made available to all children at primary
school age. One should give first priority to educating an elite which would then build up the
infrastructure needed for universalizing primary education in a remote future.

In attempting to evaluate a national school system one or more of the following criteria
could be employed. One could try to assess to what extent the system is taking care of the most
able, the average and the less able students. One could look at the attrition rate in terms of
grade-repeating and drop-out, which usually is very high in most LDC's. One could assess student
attitudes toward further learning and try to find out how motivated they are. One could follow the
students up through the system and assess how open or closed the system is in terms of options
between types of programs and tracks. One could try to measure the amount of social bias that
goes into the processes of attrition and selection.

The IEA data lend themselves to elucidating one major aspect of the problem of
universalization vs. elitism or selectivity, namely the amount of social bias that goes into the selection
procedure and the standard of the elite in a selective as compared to a more comprehensive or
universal system.

By comparing the distribution of socio-economic status, as indexed by father's occupation,
for the 10-year-olds with the one for the 14-year-olds and the pre-university students respectively,
we can make an estimation as to what extent the selection that operates from one level to another is
correlated with social background. As far as the industrialized countries are concerned, the overall
outcome of the analyses is this (Husén, 1973). In national systems with high retention at the
secondary level, selectivity on social basis is less predominant than in systems with low retention
rate and more strict selectivity. On the basis of the proportion of upper and lower stratum
representation at the 14-year-old level (when in the industrialized countries practically all children are
still in full-time schooling) and at the pre-university level respectively, an index of social bias can
be calculated (Husén, 1973). This index is unity when upper and lower strata have equal
representation. It turns out to be 1.3 for the United States and 2.4 for Sweden, two countries with
relatively high retentivity (75 and 45 per cent of the relevant age-group still in school). Social bias
in the enrollment at the senior secondary school in England is 7.9 and in the Federal Republic of
Germany as high as 37.7, two countries where the retention at that level is relatively low
(20 and 9 per cent respectively).
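
The text does not give the formula for the index of social bias, only that it compares upper and lower stratum representation at the two levels and equals unity when the strata are equally represented. One form consistent with that description, offered here purely as an assumption, is the ratio of the two strata's relative representation. The function name and the input figures in the sketch are hypothetical and are not taken from the IEA data.

```python
def social_bias_index(upper_at_14, upper_at_pre_univ, lower_at_14, lower_at_pre_univ):
    """Ratio of upper-stratum to lower-stratum relative representation.

    Each argument is a stratum's percentage share of enrollment at the given
    level. Assumed formula (not stated in the source):

        (upper_at_pre_univ / upper_at_14) / (lower_at_pre_univ / lower_at_14)

    The result is 1.0 when both strata keep their share equally well, and it
    grows as the upper stratum becomes over-represented at the higher level.
    """
    shares = (upper_at_14, upper_at_pre_univ, lower_at_14, lower_at_pre_univ)
    if any(share <= 0 for share in shares):
        raise ValueError("all shares must be positive percentages")
    upper_ratio = upper_at_pre_univ / upper_at_14
    lower_ratio = lower_at_pre_univ / lower_at_14
    return upper_ratio / lower_ratio


if __name__ == "__main__":
    # Hypothetical shares: no change in either stratum between the two levels.
    print(social_bias_index(20, 20, 30, 30))
    # Hypothetical shares: upper stratum triples its share, lower stratum halves.
    print(social_bias_index(10, 30, 40, 20))
```

A ratio of this kind satisfies the one property the text states, unity under equal representation, but the published index may be defined differently in detail.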

The students were, according to parental occupation, classified in nine categories. The
classification scheme, which had originally been developed by the International Labour Organization
in Geneva, could, however, not be employed uniformly over all the countries (Comber and Keeves,
1973). Therefore, to the extent that the categorization has been consistent within the countries,
comparisons can be made between various levels of the system in terms of the social structure of
enrollment. The high proportion of fathers with professional and clerical occupations in the LDC's




Table 3

Family Background (in per cent) in Terms of Father's Occupation
of Students at Various Levels of the Educational System in
Chile, India, Iran, and Thailand

Occupational          Chile (Age)        India (Age)        Iran (Age)         Thailand (Age)
Category              10   14   pre-     10   14   pre-     10   14   pre-     10   14   pre-
                                univ.              univ.              univ.              univ.

Professionals and
Managerial             4    8   19        8   10   16       20   20   24        3    3    9
Clerical              19   21   34       24   22   27       19   20   19       30   33   34
Skilled manual        30   27   17       52   52   43       46   44   [?]      45   [?]  [?]
Semi-skilled and
Unskilled manual      33   28    8        6    3    5        7    6    8       18    8    3
Unclassified          14   16   22       10   13    9        8   10    7        4    8    2
Total                100  100  100      100  100  100      100  100  100      100  100  100



indicates that those children who on the whole enter school are a socially select group. This seems
to be the case, for instance, in Iran. This also explains why in Iran the social composition of the
pre-university students does not differ very much from the one at the primary level. The most
marked social selection takes place in Chile, which differs from the other three LDC's in terms of
the size of the non-primary sector of the economy. The percentage of the upper stratum increases
from 4 to 19 when one moves from the 10-year-old to the pre-university level; at the same time
the number of semi- or unskilled workers decreases from 33 to 8 per cent.

Evidently, when an evaluation is made of the standard of the elite students one has to take
into consideration what proportion of the relevant age-group we are looking at. It is pointless to
limit the comparison to the mean performance at that level, simply because we are dealing with a
highly variable portion of the age-group. Among the IEA countries it varied in 1970 all the way
from 75 per cent in the United States to less than 10 per cent in Iran. Therefore, it would be not
only more 'fair', but also more informative to compare equal proportions of the age-groups. This
has been done in Table 4, where we present the means for the entire samples at the pre-university
level and the means for the top 9, 5 and 1 per cent of the entire age-group. The comparison
between countries in terms of total Science test score is based on the assumption that those who at
this age level are not in school would not have scored in any of the three top categories had they
been accessible for testing. There are indications that in the industrialized countries the means
arrived at in the top groups would not have been significantly affected. This is even more valid for
the LDC's.
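
The comparison of equal proportions of the age-group can be sketched as follows. The sketch rests on the assumption stated in the text, that students outside school would not have reached the top categories, so that the top p per cent of the age-group corresponds to the top p/r share of a sample drawn from the r per cent who are in school. The function name and the test scores are hypothetical.

```python
from statistics import mean


def mean_of_top_share(scores, pct_in_school, top_pct_of_age_group):
    """Mean score of the part of the sample equal to a top share of the age-group.

    scores: test scores of the in-school sample.
    pct_in_school: per cent of the age-group still in school (r).
    top_pct_of_age_group: per cent of the whole age-group to compare (p).

    Assumption (from the text): nobody outside school would have scored in
    the top group, so the top p per cent of the age-group is the top p/r
    share of the in-school sample.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < top_pct_of_age_group <= pct_in_school <= 100:
        raise ValueError("need 0 < top_pct_of_age_group <= pct_in_school <= 100")
    count = max(1, round(len(scores) * top_pct_of_age_group / pct_in_school))
    ranked = sorted(scores, reverse=True)
    return mean(ranked[:count])


if __name__ == "__main__":
    hypothetical_scores = list(range(1, 101))  # 100 made-up scores, 1..100
    # 30 per cent in school, top 9 per cent of the age-group: top 30 of 100 scores.
    print(mean_of_top_share(hypothetical_scores, 30, 9))
    # 9 per cent in school, top 9 per cent of the age-group: the whole sample.
    print(mean_of_top_share(hypothetical_scores, 9, 9))
```

When the share in school equals the share being compared, the function returns the full-sample mean, which is the pattern shown by the Iran row of Table 4 (10.8 in both columns).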

Table 4

Means and Standard Deviations for Science Test Score
for Total Pre-University Sample and Equivalent Proportions
of the Relevant Age Group

                        Per Cent of    Full Sample      Top 9       Top 5       Top 1
                        Age Group                       per cent    per cent    per cent
                        in School      M       SD       M           M           M

Grand Total Score
for Industrialized
Countries               30             22.0    10.6     32.3        37.1        45.9
Chile                   16              9.3     6.3     13.6        16.8        23.5
India                   14              6.3     6.1      9.5        12.8        20.8
Iran                     9             10.8     5.9     10.8        14.8        21.9
Thailand                10             12.5     6.1     13.6        17.4        23.2



We notice in Table 4 that the average score for the total sample in all the industrialized
countries is 22.0, which is more than one standard deviation above the average for the four LDC's.
If we then look at the top 9 per cent and 5 per cent, we find that the difference becomes even more
marked. The top 1 per cent of the students in the pre-university year in the LDC's score at the
level of the average student in the industrialized countries. As far as Science is concerned the
selection that has taken place in the LDC's from the lower to the higher level of the system does not
seem to have considerably increased the 'productivity' at the upper level of the system.



VI. CONCLUDING REMARKS

It is by no means a coincidence that international co-operative survey research in education started
with evaluation problems. Before one can begin to investigate to what extent various types of factors
account for differences between classrooms, schools and entire national systems of formal education,
it is necessary to develop international criteria of evaluation. The construction of international tests
that can be used in evaluating both the cognitive and non-cognitive outcomes of instruction is in itself
an important research accomplishment. But it is only the first step on the way to the ultimate goal,
which is to identify the salient factors which account for differences between systems and to explain
why they differ. By means of such research it will be possible to establish international indicators
of the qualitative outcomes of school education. One would thereby also be able to inform planners
and policy-makers about what indicators are worthwhile to manipulate in terms of policy action.

Closely related to this is the problem of how the 'productivity' of a national system of
school education should be assessed. Too long have we tended to evaluate the outcomes in terms
of the number of individuals who are enrolled at a particular stage in the system or in terms of how
many years they have completed and not by the competence they have achieved. A certain amount
of schooling in terms of number of years or a particular certificate can by no means be regarded as
comparable quantities from one system to another. Furthermore, it is not satisfactory, when
evaluating its quality, to limit oneself to the end-products of a system. One has also to consider its
power to take care of and impart competence in all students who enter the system. Since attrition,
particularly in terms of drop-outs, in many systems is very high, one basic question that needs to
be answered in evaluating a system is: How many students are brought how far?

As far as the evaluation of national systems of education in the LDC's is concerned, the
IEA research has brought about the accumulation of strategies and techniques which can begin to be
utilized routinely. Methods of analyzing national curricula in terms of the goals which are to be
achieved have been developed. Similarly, techniques have been devised by means of which
instruments can be constructed to measure these goals. Procedures for drawing probability samples
from target populations under consideration have been developed. Routines for data collection in
the schools have been tried out in a wide variety of contexts. Finally, experiences have been gained
in data processing that lend themselves to nation-wide evaluation surveys.

The IEA international headquarters, as well as the National Centers, have over the last
ten years built up a considerable amount of collective competence with regard to the conceptualization
of research problems connected with evaluation, the techniques employed, and the different modes
of feedback to policy-makers in the countries concerned. The co-operative machinery that has been
built up could be utilized to provide training programs for students from regions of the world where
particular strengths and competencies in evaluation are still developing. From the IEA international
network it would be relatively simple to set up task forces to work with centers in LDC's. Such
forces could co-operate with local researchers on designing evaluation surveys.
